Abstract

We explore the use of a shoe-mounted camera as a sensory
system for wearable computing. We demonstrate tools
useful for gait analysis, obstacle detection, and context
recognition. Using only visual information, we detect periods
of stability and motion during walking. In the stable
phase, the foot can be assumed to be parallel to the ground
plane. In this condition, the floor dominates the lower part
of the camera’s view, and we show that it can be segmented
out from the remainder of the scene, leaving walls and obstacles.
We also demonstrate floor surface recognition for
context awareness.


1.  Introduction 

Costs  for  digital  cameras  and  computation  continue  to  be 
driven  lower  by  technological  advances  and  strong  demand. 
Future  wearable  computing  systems  can  benefit  from  these
trends  by  applying  cameras  to  new,  more  specialized,  and 
less  traditional  sensing  tasks.  In  this  paper,  we  explore  the 
use  of  a  shoe-mounted  camera  for  gait  analysis,  obstacle 
detection,  and  context  recognition. 

Wearable  computing  on  shoes  has  been  used  for  a  variety 
of  purposes,  including  user  interfaces  [4],  power  production 
[7],  and  gambling  [8].  We  show  that  visual  processing  can 
also  benefit  from  this  prime  location.  The  planted  foot  is  the 
only  part  of  the  body  that  is  reliably  stationary  with  respect 
to  the  world  during  walking  and  standing.  When  we  walk, 
our  feet  come  into  contact  with  the  ground  in  an  alternating
pattern.  Each  foot  swings  swiftly  through  the  air,  then
is  pressed  against  the  ground  as  the  weight  of  the  body  is
transferred  onto  it  [5].  During  these  key  moments  within  a
person’s  stride,  the  planted  foot  tends  to  be  in  a  canonical
orientation  with  respect  to  the  floor  and  relatively  motionless,
which  leads  to  simplified  vision  processing.

Figure  1.  The  system.  A  camera  and  inertial
sensor  are  mounted  on  a  sandal.


In this paper, we analyze the wearer’s gait and pick out
frames corresponding to moments of stability when the
foot is pressed against the floor. During these moments, the
floor dominates the lower part of the camera’s view, and we
show that the floor can be segmented out from the remainder
of the scene, leaving walls and obstacles. Next, we demonstrate
floor surface recognition for context awareness. We
conclude by speculating about the role a foot-mounted camera
could play in future wearable systems.

2.  The  platform 

Our system consists of a camera and three inertial sensors.
The camera is mounted at the very front of a sandal, as
shown in Figure 1, and is rigidly attached to one inertial sensor.
The remaining two inertial sensors are attached to the leg as
shown; however, they play no role in the part
of the project described here. Data is logged on a laptop
carried in a backpack. The system was tested on two floors
of a building, on eight different surfaces (see Figure 6).

Figure 2. This figure shows a sequence of frames taken during a single step. In that step, the wearer
moves from a lobby area into a corridor, and from a blue to a red carpet. The main swing phase of the
step occurs in frames five to nine.


3.  Gait  analysis 

When the foot is pressed against the ground, the camera
is in a fixed orientation with respect to the ground
plane, so this is an ideal opportunity for visual processing.
Figure 2 shows images recorded during a single
step. In the initial part of the swing phase, the floor in these
frames becomes blurred (low spatial derivative $\Delta I_x$), the
view changes rapidly (high temporal derivative $\Delta I_t$), and
the foot turns downwards towards the floor (low average
luminance $I_0$). Each of these cues could be used individually
to identify the swing phase of walking. We combine them
for robustness into a single measure s.

Typically, during the swing phase, the summed magnitude
of the temporal derivative of the pixels in the image
increases dramatically, and the summed magnitude of
the spatial derivative falls because blurring removes
high-frequency edges. The spatial derivative is normalized for
the overall average luminance, since this changes as the camera
moves from pointing towards the floor to pointing towards
the ceiling (and lights). Since the wearer may move
between different surfaces, the average component of $\Delta I_x$
and $I_0$ is removed using a running average.
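
The cues described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not give the formula that combines the cues into s, so the weighted sum, the weights, the sign convention (s is high when the foot is stable), the exponential form of the running average, and the names `RunningMean`, `frame_cues`, and `stability_scores` are all hypothetical. Frames are assumed to be grayscale images stored as lists of rows.

```python
class RunningMean:
    """Exponential running average, used to remove the slowly varying component."""

    def __init__(self, alpha=0.05):  # alpha is a hypothetical smoothing factor
        self.alpha = alpha
        self.mean = None

    def detrend(self, value):
        """Update the running average, then return the value with it removed."""
        if self.mean is None:
            self.mean = value
        else:
            self.mean += self.alpha * (value - self.mean)
        return value - self.mean


def frame_cues(prev, curr):
    """Return (temporal, spatial, luminance) cues for one grayscale frame.

    prev and curr are equally sized lists of rows of pixel intensities.
    """
    pixels = [p for row in curr for p in row]
    luminance = sum(pixels) / len(pixels)  # average luminance I_0
    # Summed magnitude of the temporal derivative (frame difference).
    temporal = sum(abs(c - p)
                   for row_c, row_p in zip(curr, prev)
                   for c, p in zip(row_c, row_p))
    # Summed magnitude of the horizontal spatial derivative,
    # normalized for the overall average luminance.
    spatial = sum(abs(row[x + 1] - row[x])
                  for row in curr
                  for x in range(len(row) - 1))
    spatial /= max(luminance, 1e-9)
    return temporal, spatial, luminance


def stability_scores(frames, weights=(1.0, 1.0, 1.0)):
    """Combine the cues into one score per frame (assumed form, not from the paper)."""
    w_t, w_x, w_0 = weights  # hypothetical weights
    spatial_avg, luminance_avg = RunningMean(), RunningMean()
    scores = []
    for prev, curr in zip(frames, frames[1:]):
        temporal, spatial, luminance = frame_cues(prev, curr)
        # Sharp, bright, static frames score high; blurred, dark, moving ones low.
        s = (w_x * spatial_avg.detrend(spatial)
             + w_0 * luminance_avg.detrend(luminance)
             - w_t * temporal)
        scores.append(s)
    return scores
```

In practice the three cues have very different scales, so the weights would need to be tuned on recorded data before the combined score is useful.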

Figure 3. Visual and orientation information
recorded during a single step (as shown in
Figure 2). The stride begins as the orientation value
goes negative.

Plots of these measurements for the step in Figure 2
are shown in Figure 3. As the step begins, the temporal
derivative increases (due to motion), the spatial derivative
decreases (due to blur), and the mean luminance tends to
fall (due to the camera looking towards the floor). Pitch
information from the inertial sensor attached to the camera
is also shown. We used this as an independent measure to
verify the visual gait analysis. Whether the foot is in full
swing, standing, or in an intermediate state is determined
by analyzing s. A transition between steps is assumed to
occur whenever s drops and rises again by at least 5% of
its maximum range. To demonstrate this, a longer walking
sequence is shown in Figure 4. Periods of stability detected
by the gait analysis are then processed to achieve floor
segmentation and recognition.
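
The drop-and-rise rule for step transitions can be read as a simple hysteresis detector. The sketch below is one possible interpretation, not the authors' implementation: it assumes the whole sequence of s values is available so that the maximum range can be computed up front, and the function name `detect_transitions` is hypothetical.

```python
def detect_transitions(s, fraction=0.05):
    """Return indices at which s has dropped and then risen again.

    Both the drop and the rise must be at least `fraction` of the full
    range of s (5% in the text). Computing the range over the whole
    recording is an assumption made for this offline sketch.
    """
    if not s:
        return []
    threshold = fraction * (max(s) - min(s))
    if threshold == 0:
        return []  # a constant signal contains no transitions

    transitions = []
    dropped = False
    extreme = s[0]  # running maximum before a drop, running minimum after it
    for i, value in enumerate(s):
        if not dropped:
            extreme = max(extreme, value)
            if extreme - value >= threshold:
                dropped = True      # s fell far enough below its recent peak
                extreme = value
        else:
            extreme = min(extreme, value)
            if value - extreme >= threshold:
                transitions.append(i)  # s recovered: one transition completed
                dropped = False
                extreme = value
    return transitions
```

Small ripples below the threshold are ignored, which is the purpose of requiring both a drop and a rise of a minimum size. An online version would have to estimate the range from past data only.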


Figure  5.  Floor  segmentation  in  action.  The  top  row  shows  original  images,  the  second  row  shows 
masks  corresponding  to  the  floor,  and  the  bottom  row  overlays  the  two. 

Figure 4. Visual and orientation information
recorded while walking down a corridor. The
stride period is visible in all modalities.


4.  Floor  segmentation


Due to the canonical orientation of the stable images selected
by our gait analysis, we know with high probability
that the bottom quarter of the image is floor and that the
top quarter of the image is not floor. The bottom quarter of
these special images corresponds to the area from the toe
out to 3 inches on the floor. The top quarter of these images
is very far above the horizon line. In this section and the
next, we exploit this property to perform segmentation and
recognition. With segmentation, this observation allows us
to collect a significant sample of the appearance of the floor
and non-floor parts of the image. With recognition, we are
able to reliably sample from the floor without the need for
any floor detection algorithms.

These images are well suited to wearable computing applications
that benefit from detailed sensing of the wearer's
nearby environment (for example, detection of walking hazards
[3, 6]). Segmenting the floor in the image can serve as a
first step to analyzing the free space, objects, and obstacles
close to the wearer, out to about 6 feet with our wide angle
lens. We show example results from our floor segmentation
algorithm in Figure 5. If we assume flat floors, we can construct
a function that for each pixel gives the distance from
the toe to the corresponding point on the floor [2], which
could be useful for object avoidance.

The bottom quarter and top quarter of the image are used
to initialize two probabilistic appearance models, one for
the floor and one for the non-floor parts of the image,
respectively. These two appearance models and the resulting
segmentation are iteratively optimized using EM (expectation
maximization) to find a maximum likelihood segmentation of
the floor and non-floor. The segmentation boundary is
constrained to be a set of radial distances emanating from
the center of the bottom of the image.
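
The segmentation procedure of this section (appearance models initialized from the bottom and top quarters of the image, then refined together with a radially constrained boundary) can be illustrated with a much simplified sketch. Several choices here are assumptions rather than details from the text: the appearance models are single Gaussians over grayscale intensity, the iteration uses hard assignments (a "classification EM" variant) instead of soft responsibilities, and the names `segment_floor`, `make_rays`, and `best_cut` are hypothetical.

```python
import math


def gaussian_fit(values):
    """Fit a 1-D Gaussian appearance model; the variance is floored at 1.0."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1.0)


def log_gauss(v, model):
    mean, var = model
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)


def make_rays(height, width, n_rays=32):
    """Pixel coordinates along rays leaving the centre of the bottom edge."""
    oy, ox = height - 1, (width - 1) / 2
    rays = []
    for k in range(n_rays):
        angle = math.pi * (k + 0.5) / n_rays  # strictly between 0 and pi
        ray, r = [], 0
        while True:
            y = round(oy - r * math.sin(angle))
            x = round(ox + r * math.cos(angle))
            if not (0 <= y < height and 0 <= x < width):
                break
            if not ray or ray[-1] != (y, x):
                ray.append((y, x))
            r += 1
        rays.append(ray)
    return rays


def best_cut(values, floor, other):
    """Most likely boundary: values[:cut] are floor, values[cut:] are non-floor."""
    score = sum(log_gauss(v, other) for v in values)  # likelihood for cut = 0
    best, best_score = 0, score
    for cut, v in enumerate(values, start=1):
        score += log_gauss(v, floor) - log_gauss(v, other)
        if score > best_score:
            best, best_score = cut, score
    return best


def segment_floor(image, n_rays=32, iterations=5):
    """Return (rays, cuts): along ray i, the first cuts[i] pixels are floor."""
    height, width = len(image), len(image[0])
    quarter = max(height // 4, 1)
    # Initialize: bottom quarter is assumed floor, top quarter non-floor.
    floor = gaussian_fit([p for row in image[-quarter:] for p in row])
    other = gaussian_fit([p for row in image[:quarter] for p in row])
    rays = make_rays(height, width, n_rays)
    cuts = []
    for _ in range(iterations):
        # Segmentation step: one radial distance per ray.
        cuts = [best_cut([image[y][x] for y, x in ray], floor, other)
                for ray in rays]
        # Model step: refit both appearance models from the current labels.
        floor_px, other_px = [], []
        for ray, cut in zip(rays, cuts):
            floor_px += [image[y][x] for y, x in ray[:cut]]
            other_px += [image[y][x] for y, x in ray[cut:]]
        if floor_px:
            floor = gaussian_fit(floor_px)
        if other_px:
            other = gaussian_fit(other_px)
    return rays, cuts
```

Representing the boundary as one distance per ray keeps the floor region connected to the toe, which matches the expectation that the floor extends outward from the bottom of the image until it meets a wall or obstacle. A realistic system would likely use color or texture features rather than a single intensity value.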


5.  Recognition 

Areas with different functions often have distinct floor
surfaces (see Figure 6). For example, a wash room floor
is unlikely to be carpeted, since it needs to be easy to mop.
Floor recognition is therefore a valuable cue for localization
and context awareness [1].

Figure 6. Floors encountered. Four are carpeted,
four are not. The floors are drawn from
corridors, an office, lab space, a kitchen area
and a wash room. There is considerable variety
in color and texture.

Table 1. Confusion matrix for floor recognition
(classification frequencies for floor samples).

We have used camera placement to essentially solve the
problem of floor detection, as detailed in Sections 3 and 4. If
we model the appearance of the lower part of the image, the
model will be dominated by the floor. As a proof of concept, we
simply compute the average color of this part of the image
(normalized for luminance) and use it to represent the floor.
We took training samples from eight qualitatively different
floors in our building. We tested a large set of other samples,
comparing them with the models using a simple Euclidean
distance metric. The confusion matrix is shown in Table 1.
A total of 86.3% of the classifications are correct. A classifier
that always guessed “floor number 1” (the most frequent
case in the data) would have a 44% success rate. The number
of samples differs between floors because the data was
collected by simply walking around, and the areas covered
by the different floor types are of different sizes. The greatest
confusion is between two similar reddish-hued
carpets (floors number 7 and 8).
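
The proof-of-concept classifier described in this section can be sketched as a nearest-model lookup. The exact form of the luminance normalization is not stated in the text, so dividing the average color by the sum of its channels is an assumption, as are the function names `chromaticity`, `train`, and `classify` and the choice to average the training signatures into one model per floor.

```python
import math


def chromaticity(pixels):
    """Average color of an RGB patch, normalized for luminance.

    pixels is a list of (r, g, b) tuples sampled from the lower part of
    a stable image. Normalizing by r + g + b is an assumed convention.
    """
    n = len(pixels)
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    total = sum(mean) or 1.0  # guard against an all-black patch
    return tuple(m / total for m in mean)


def train(samples_by_floor):
    """Build one model per floor: the mean signature of its training patches."""
    models = {}
    for label, patches in samples_by_floor.items():
        signatures = [chromaticity(patch) for patch in patches]
        models[label] = tuple(
            sum(sig[c] for sig in signatures) / len(signatures)
            for c in range(3)
        )
    return models


def classify(patch, models):
    """Return the label of the model closest in Euclidean distance."""
    signature = chromaticity(patch)
    return min(models, key=lambda label: math.dist(signature, models[label]))
```

Because the signature discards overall brightness, the same carpet seen under dimmer lighting maps to nearly the same point. The cost of that choice is that floors with similar hue, such as the two reddish carpets mentioned above, remain hard to separate.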

6.  Discussion  and  conclusions 

We have shown that a foot-mounted camera is well
placed for a number of sensory tasks relevant to wearable
computing applications. Specifically, we have presented
methods and results for gait analysis, floor segmentation,
and floor recognition based solely on images from the camera.

In general, as cameras and computation become less
costly, we expect more specialized camera sensing, such
as this, to become practical for wearable computing. Issues
of privacy and misuse could be mitigated by making
a closed sensory system. A camera on each foot would
make several applications easier by allowing for nearly
uninterrupted acquisition of stable images. Several interesting
future applications might be built on top of the results we
have presented, including automated cartography, localization,
detection of nearby people by their feet and legs,
recognition of common nearby objects such as chairs, tables,
walls, and trash cans, extensions to outdoor terrain, and more
powerful floor recognition systems.